ITEM CHARACTERISTIC CURVE PARAMETERS:
EFFECTS OF SAMPLE SIZE ON LINEAR EQUATING



I. INTRODUCTION

The application of the technology of computer-driven adaptive testing requires the
development of large banks of test items. Each bank may contain 250 to 400 items, and all must
measure the same ability on the same metric or scale. It is unreasonable and impracticable to
assemble a single group of 2,000 subjects for 250 to 400 minutes to try all the items; therefore,
a method for linking together subsets of items administered to varying groups must be
investigated. Item Characteristic Curve (ICC) theory offers a unique method of linking subsets of
test items due to the invariance property of the ICC parameters. This invariance property rests on
the two major theoretical assumptions of latent-trait theory: (a) unidimensionality and (b) local
independence. Unidimensionality means that only a single ability is being measured and is assumed
to be the property of an item pool, even when assembled into subsets. Local independence means
that the subjects' responses to an item are independent of the responses to another item. More
simply put, this means that the item response is a function of ability and no other factor; in
effect, this is a restatement of the unidimensionality assumption. If an item pool is
unidimensional, then any shift in score metric that is due to a linear transformation may be
corrected or adjusted by application of the proper complementary linear transformation. This is
what is meant by the idea that latent-trait parameters are invariant to a linear transformation, and
it is this theoretical property that allows item pools to be linked and transformed to a common
metric. In previous research efforts, item pools have been linked via the method of linear equating
(see Lord, 1977; Ree, 1977; Sympson & Ree, in press) with apparent success. To date there has
been little research on the efficacy of these linking procedures and the effects of errors in ICC
parameter estimation on their (linearly) transformed values.

ICC Parameters

The three-parameter logistic model of Birnbaum (Lord & Novick, 1968) is the most
frequently used for relating item responses to subjects' ability. The three parameters, a, b, and c,
are item discrimination, item difficulty (or location), and probability of chance success (or lower
asymptote), respectively.

The curve described by these parameters takes the shape of an ogive (cumulative frequency)
or an "S", with the upper asymptote approaching a probability of 1.0 and usually a lower
asymptote of a probability greater than 0.0. The ogive describes the probability of obtaining a
correct answer to an item as a monotonic increasing function of ability.

The item discrimination parameter, a, is a function of the slope of the ICC and generally
ranges from .5 to about 2.5. A value of a equal to about 1.0 is typical of many test items,
while values below .5 are insufficiently discriminating for most testing purposes, and a values
above 2.0 are infrequently found.

The item difficulty parameter, b, describes the point of inflection of the ICC and is usually
scaled between -2.5 and +2.5, although the metric is arbitrary.

The item guessing parameter, c, is the lower asymptote of the ICC and is generally
conceived as the probability of selecting the correct item option by chance alone. Most test items
have c parameters greater than 0.0 and less than or equal to .30.

Figure 1 shows three ICCs. The horizontal axis is scaled in units of ability θ and the
vertical axis is the probability of answering the item correctly. The solid curved line shows an ICC
for an item of average difficulty with acceptable discrimination and the lower asymptote
appropriate for a five-option multiple-choice item. The dashed line shows an item of identical
difficulty, c value of .28, but with a lower a value. Note how the slope of the curve is less
steep. The third curve, the dot-dash line, shows an item with a c value of .30, an a parameter of 1.0,
and the b parameter equal to 1.0. As the b parameter changes, the location of the inflection
point of the curve is displaced along the horizontal axis.

Figure 1. Item characteristic curves.

Equation 1 presents the mathematical function describing the curve.

$$P_i(\theta) = c_i + (1 - c_i)\left[1 + e^{-1.7\,a_i(\theta - b_i)}\right]^{-1} \qquad (1)$$
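As a minimal sketch, Equation 1 can be evaluated directly. The function name `icc_3pl` and the ability values chosen below are illustrative assumptions, not part of the original study; the item parameters are those given for the dot-dash curve of Figure 1.

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter
    logistic model of Equation 1 (with the 1.7 scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Item with a = 1.0, b = 1.0, c = 0.30 (the dot-dash curve in Figure 1).
# The ability values are arbitrary illustration points.
for theta in (-2.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  P={icc_3pl(theta, 1.0, 1.0, 0.30):.3f}")
```

At θ = b the logistic term equals one half, so the probability is c + (1 - c)/2, which is the inflection point described for the b parameter.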



Previous research (Ree, 1978) indicates that the ICC parameters may be estimated with some
reasonable degree of accuracy, provided a sufficient sample of examinees with an appropriate
distribution of ability, θ, is available.

Linking Paradigms

Two fundamental linking procedures may be defined and are known as the Anchor Items
Method (AIM) and the Anchor Subjects Method (ASM). In AIM, every subset of items is
administered to a different sample of subjects, but embedded into the group of items to be
analyzed is a common (or anchor) set of items. During analysis, the anchor items are identified,
and the following linear transformation is applied to the resultant ICC parameters:



$$b^{*} = \frac{s_{b_1}}{s_{b_2}}\,(b - \bar{b}_2) + \bar{b}_1 \qquad (2)$$

where b* is the item location parameter transformed to the desired scale, b is the observed
location parameter, s_b1 and s_b2 are the standard deviations of the anchor-item b values on the
desired scale and observed scale, respectively, and b̄_1 and b̄_2 are the corresponding means.
A similar procedure for the a parameter is defined by

$$a^{*} = \frac{s_{b_2}}{s_{b_1}}\,a \qquad (3)$$

where a* is the item discrimination parameter transformed to the desired scale, a is the observed a
parameter, and s_b1 and s_b2 are as in Equation 2. Because the c parameter is measured on the
probability axis, it does not change and no transformation need be applied.
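The anchor-item transformation can be sketched as follows. This assumes the usual linear-equating form, in which the slope is the ratio of the anchor-item b standard deviations (desired scale over observed scale) and the intercept aligns the anchor-item b means. The function names and the anchor values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def equating_constants(anchor_b_desired, anchor_b_observed):
    """Slope and intercept that carry observed-scale b values onto the
    desired scale, computed from the anchor items only."""
    slope = stdev(anchor_b_desired) / stdev(anchor_b_observed)
    intercept = mean(anchor_b_desired) - slope * mean(anchor_b_observed)
    return slope, intercept

def equate_item(a, b, c, slope, intercept):
    """Transform one item: b is rescaled and shifted, a is divided by
    the slope, and c is left unchanged (it lies on the probability axis)."""
    return a / slope, slope * b + intercept, c

# Hypothetical anchor b values: the observed scale is a linear distortion
# of the desired scale, so equating should undo it exactly.
b_desired = [-1.5, -0.5, 0.0, 0.5, 1.5]
b_observed = [(v + 0.1) / 0.8 for v in b_desired]

slope, intercept = equating_constants(b_desired, b_observed)
a_star, b_star, c_star = equate_item(1.25, b_observed[0], 0.20, slope, intercept)
print(round(slope, 4), round(intercept, 4), round(a_star, 4), round(b_star, 4), c_star)
```

Dividing a by the same slope that multiplies b keeps the product a(θ - b) unchanged, which is why the item's response probabilities are the same on either scale.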

The ASM requires that the same group of subjects be available to take each subset of items.
It is extremely unlikely that the same 2,000 subjects could be assembled to take items over a
long period of time, as would be required to place tests on the same metric from year to year.
For this reason, the ASM seems less likely to find long-term practical application. Because
of its potential for use, the AIM procedure is the subject of the present study.



II. METHOD



In order to have a known standard for reference, a simulation study was run using two
groups of subjects, a single set of 20 anchor items, and two differing groups of 60 experimental,
or non-anchor, items. These two groups of items were assembled into two tests designated T1 and
T2. Both groups of simulated subjects were specified to have about the same normal distribution
of θ. Table 1 shows the mean, standard deviation, minimum, and maximum of θ for the groups S1
and S2. These two groups represent what might be expected if subjects for experimental testing
were picked from some larger pool, such as candidates for military enlistment.
Response vectors for these subjects were generated on the two tests.



Table 1. Mean, Standard Deviation,
Minimum and Maximum of θ
for Groups S1 and S2

                    Group
Parameter       S1          S2
Mean          -0.0145      0.0250
SD             0.9976      1.0045
Minimum       -2.6000     -2.6000
Maximum        2.6000      2.6000



Generation of Item Responses

In order to generate a vector of item responses for each "subject," the θ values were used in
Equation 1 to compute the likelihood of "passing" each item.

Because Equation 1 yields a number P_i(θ) such that 0.0 < P_i(θ) < 1.0, a number X_i is
drawn from a uniform (rectangular) distribution ranging from 0.0 to 1.0 and compared to P_i(θ). If
X_i is larger than P_i(θ), then an incorrect response is specified for the item; otherwise, a correct
response is specified for the item. Thus, a "subject" with P_i(θ) = .90 gets the item correct 9 in
10 times, and a vector of item responses is developed for each "subject" in each data set. These
response vectors are then used to investigate the AIM linking procedures.
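The response-generation rule above can be sketched in a few lines. The item parameters, the seed, and the function names are hypothetical; only the comparison of a uniform draw with P_i(θ) comes from the text.

```python
import math
import random

def icc_3pl(theta, a, b, c):
    """Equation 1: three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def simulate_responses(theta, items, rng):
    """Return a 0/1 response vector for one simulated subject.
    `items` is a list of (a, b, c) tuples."""
    responses = []
    for a, b, c in items:
        p = icc_3pl(theta, a, b, c)
        x = rng.random()                      # uniform draw on [0.0, 1.0)
        responses.append(0 if x > p else 1)   # X larger than P: incorrect
    return responses

rng = random.Random(12345)                    # hypothetical seed, for repeatability
items = [(1.0, -1.0, 0.20), (1.0, 0.0, 0.20), (1.0, 1.0, 0.20)]  # hypothetical items
vector = simulate_responses(0.0, items, rng)
print(vector)
```

Over many simulated subjects of the same ability, the proportion of correct responses to an item converges on P_i(θ), which is what makes the generating parameters a known standard.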

Table 2 shows the distribution of ICC parameters for the 80 items for Test 1 (T1) and Test
2 (T2), while Table 3 shows the ICC parameters for the 20 anchor items which are common to
both tests.

Subjects from Group 1 were administered only the items in Test 1, and subjects from Group
2 only the items in Test 2. In order to study the effects of sample size, the ICC parameters were
estimated on four samples drawn with replacement as follows: 250; 500; 1,000; and 2,000. The
ICC parameters were estimated on these four sample sizes for both groups. Anchor ICC parameter
values from the four samples administered Test 1 serve as the input values for the anchor item
parameters to the second test. This permitted the four sizes of calibration sample (250; 500;
1,000; 2,000) to be varied and tried out with the four samples used to estimate the anchor item
ICC parameters.
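The crossed design described above, four calibration sample sizes by four anchor-estimation (equating) sample sizes, can be enumerated with a short sketch. The subject pool, the seed, and the helper name `draw_sample` are assumptions made for illustration; the report does not describe its sampling code.

```python
import itertools
import random

SAMPLE_SIZES = (250, 500, 1000, 2000)

def draw_sample(pool, n, rng):
    """Draw n simulated subjects from the pool with replacement."""
    return rng.choices(pool, k=n)

# Every pairing of calibration sample size with equating sample size.
design = list(itertools.product(SAMPLE_SIZES, SAMPLE_SIZES))

rng = random.Random(2024)                 # hypothetical seed
pool = list(range(2000))                  # hypothetical pool of subject indices
samples = {n: draw_sample(pool, n, rng) for n in SAMPLE_SIZES}

print(len(design))                        # number of (calibration, equating) pairs
print({n: len(s) for n, s in samples.items()})
```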



Table 2. Means and Standard Deviations
of the Generated Item Parameters for Test 1 (T1)
and Test 2 (T2)

                          Test
Parameter             T1          T2
a        Mean       1.0564      1.0452
         SD         0.2793      0.2394
b        Mean       0.0847     -0.0559
         SD         0.8442      0.8577
c        Mean       0.1878      0.2017
         SD         0.0542      0.0474



Table 3. ICC Parameters of the Anchor Items Common to Both Tests

                 ICC Parameter
             a          b          c
Mean       1.0600      .0060      .2015
SD          .2113      .9549      .0453

Item values are listed in item order within each column. [?] marks one or more entries,
or digits, that are illegible in the source.

a: [?], .8000, 1.0000, 1.0000, 1.2000, [?], 1.2000, 1.3000, 1.4000, 1.4000, 1.3000, 1.2000,
   1.2000, 1.1600, 1.0000, 1.0000, .8000, .8000, .8000
b: -1.5000, -1.3500, [?], -.9000, -.7500, -.6000, -.4500, -.3000, -.1500, -.1500, .3000, [?],
   .6000, [?], .7500, .9000, 1.0500, 1.2500, 1.35[?], [?].5000
c: .1000, .1000, .1500, .1500, .2000, .2000, .2100, .2000, .2000, .2000, .2200, .2500,
   .2000, .2200, .2200, .2000, .2500, .2[?]00, .2500
III. RESULTS

Table 4 shows the intercorrelations between the known item parameters and the estimated
parameters. As past research indicates (Urry, 1976), the correlations all increase with increasing
sample size. The correlations in Test 1 for b and estimates of b start high at .952 and increase
to an exceptionally high .992. Correlations for a and estimates of a begin moderately at .666 and
climb to .869, but the correlations of c and estimated c increase from only .031 to .115. In Test
2, much the same pattern is observed except that the correlation of c and estimated c increases
from .164 to .315 as sample size increases.

Because correlations are insensitive to constant differences, as might be found if ICC
parameters are overestimated or underestimated by a constant amount, summed absolute deviations
of the estimated parameters from the known parameters were computed for each parameter in
each sample size. Table 5 presents the summed absolute deviations (or summed errors) for both
tests with the four sample sizes. Figure 2 displays this graphically. There is a large drop in
summed error when the a parameter is estimated on progressively larger samples of subjects up to
and including the difference between 1,000 and 500 subjects. Between 1,000 and 2,000 subjects,
the difference in summed error is smaller. The relationship between error and sample size for the
b parameter is more nearly constant. That is, the line on the figure for estimates of b is generally
straight, which means error tends to be reduced in direct proportion to the number of subjects.
The almost flat line for the c parameter indicates that virtually no reduction of error is occurring
Table 4. Intercorrelations between Known
and Estimated ICC Parameters for Both
Groups with Varying Sample Sizes

Parameter      N        Test 1         Test 2
a              250       .666           .512
               500       .671           .725
             1,000       .831           .813
             2,000       .869           .886
b              250       .952           .929
               500       .964           .962
             1,000       .980           .979
             2,000       .992           .987
c              250       .031           .164
               500       .035           .109
             1,000      [illegible]     .331
             2,000       .115           .315



Table 5. Summed Absolute Deviations (Σ|Error|) and Average Absolute
Deviations (|Error|) for the Three ICC Parameters
for the Two Tests

                            Test 1                   Test 2
Parameter      N       Σ|Error|   |Error|       Σ|Error|   |Error|
a              250     30.6450     .3831        30.5290     .3816
a              500     22.8090     .2851        20.6910     .2586
a            1,000     15.7490     .1969        16.8910     .2111
a            2,000     15.5980     .1950        15.1390     .1892
b              250     23.5050     .2938        20.8470     .2606
b              500     19.8600     .2483        16.6070     .2076
b            1,000     17.6890     .2211        13.8050     .1726
b            2,000     12.7350     .1592        11.5130     .1439
c              250      7.7360     .0967         7.2350     .0904
c              500      7.3600     .0920         7.5120     .0939
c            1,000      6.9080     .0864         7.3180     .0915
c            2,000      6.4400     .0805         6.8640     .0858



Figure 2. Error in Estimation of ICC Parameter (horizontal axis: sample size).

with increasing sample size for that parameter. The average absolute deviation for the c parameter
is almost one-third of the entire range of the parameter, as the c parameter is generally estimated
between .00 and .30. However, past research (Ree, 1979) indicates that, even for low ability
subjects, the effects of errors in the estimation of the c parameter are small.
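The error measures reported in Table 5 can be sketched as follows. The function names and the three-item example are hypothetical; the final lines only re-derive one Table 5 entry (the a parameter, Test 1, N = 250) from its summed error and the 80 items per test.

```python
def summed_abs_deviation(estimated, known):
    """Sum of |estimate - known value| over all items."""
    if len(estimated) != len(known):
        raise ValueError("estimated and known must have the same length")
    return sum(abs(e - k) for e, k in zip(estimated, known))

def average_abs_deviation(estimated, known):
    """Summed absolute deviation divided by the number of items."""
    return summed_abs_deviation(estimated, known) / len(known)

# Hypothetical example: every estimate is too high by a constant 0.3.
# The estimates correlate perfectly with the known values, yet the
# summed error is clearly non-zero.
known = [0.8, 1.0, 1.2]
estimated = [k + 0.3 for k in known]
print(round(summed_abs_deviation(estimated, known), 4))
print(round(average_abs_deviation(estimated, known), 4))

# Table 5, a parameter, Test 1, N = 250: summed error over 80 items.
print(round(30.6450 / 80, 4))
```

This is the reason the deviations are reported alongside the correlations: a constant over- or underestimate leaves the correlation untouched but shows up directly in the summed error.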

Summed deviations of known ICC parameters from the equated value of the ICC parameters
were computed for the a and b parameters for the 16 combinations of calibration sample size and
equating sample size. Table 6 shows the summed deviations and the per-item deviation for both
parameters for the 16 combinations. The equated a parameter shows large summed deviations
whenever the sample has been limited to 250 subjects, whether in the calibration or equating
sample. The lowest error rates for the a parameter occur when the anchor item values have been
estimated on 2,000 subjects. The effects of the size of the calibration sample are not so clear-cut.
When 2,000 subjects are used to estimate the anchor item ICC parameters, the magnitude of the
error is approximately the same for all calibration sample sizes except 250. With increasing
calibration sample size, the error rate increases by some small amount as indicated by the average
(per item) error. This is an unexpected result, and an explanation may be found in the
relationship between the sets of estimated a parameters. If the estimated a parameters were all
estimates of the same value and if the test scale were unidimensional, a basic assumption of the
theory, then the estimated a parameters should be linear transformations of one another and
should be correlated 1.0, as correlations are invariant to a linear transformation. This was not
found to be the case, and Table 7 shows the intercorrelation of the estimated a parameters. Only
the correlation between the estimate of a calculated on 1,000 subjects and the estimate of a
calculated on 2,000 subjects approaches this relationship. This lack of linearity may be due to the
assumption of normality and to the rescaling used in the calibration procedure, and these may
interact in such a way as to produce the anomalous results. Table 8 shows the intercorrelation of
estimated b parameters. All exceed .900, and the summed deviations also show a steady decrease
as sample size increases for the b parameter, indicating virtually linear transformation of



Table 6. Summed Absolute Deviations (Σ|Error|) and Average Absolute Deviations
(|Error|) for the a and b Parameters for Various
Equating and Calibrating Sample Sizes

Number of Subjects                     a                          b
Calibration   Equating       Σ|Error|   |Error|         Σ|Error|     |Error|
    250        2,000         34.2263     .4278          23.3679       .2921
    500        2,000         15.1282     .1891          21.9342       .2742
  1,000        2,000         15.9871     .1998          16.3660       .2046
  2,000        2,000         16.5958     .2074          13.4579       .1682

    250        1,000         38.3625     .4795          25.6440       .3205
    500        1,000         17.6788     .2210          24.3413       .3043
  1,000        1,000         19.5867     .2448          19.1156       .2389
  2,000        1,000         21.0321     .2629          16.8828       .2110

    250          500         48.6112     .6076          25.4374       .3180
    500          500         24.5582     .3070          22.8994       .2862
  1,000          500         28.8291     .3604          18.1871       .2273
  2,000          500         31.2094     .3901          15.8328       .1979

    250          250         44.3122     .5539          [illegible]   .3275
    500          250         21.5767     .2697          [illegible]   .3052
  1,000          250         24.4389     .3117          19.4843       .2436
  2,000          250         27.0242     .3378          17.3255       .2166




Table 7. Intercorrelations, Means,
and Standard Deviations of the Estimated
a Parameters(a) for Test 2

            1          2          3          4
1         1.000
2          .757      1.000
3          .696       .[?]60    1.000
4          .595       .803       .926      1.000
Mean      1.3525     1.2539     1.2348     1.2268
SD         .4843      .3347      .3254      .3061

(a) Variables are for the four sample sizes: 250; 500; 1,000;
2,000.



Table 8. Intercorrelations, Means, and
Standard Deviations of the Estimated
b Parameters(a) for Test 2

            1          2          3          4
1         1.00
2          .952      1.00
3          .940       .978      1.00
4          .935       .969       .986      1.00
Mean       .0563      .0591      .0735      .0559
SD         .8558      .8384      .8700      .8727

(a) Variables are for the four sample sizes: 250; 500; 1,000;
2,000.



estimated b parameters from sample to sample. However, with 500 subjects in the equating
sample, a similar anomaly is observed which may also be due to normal assumptions and to
rescaling.
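The argument that sets of estimates related by a linear transformation must correlate 1.0 can be checked with a small sketch. The parameter values are hypothetical and `pearson_r` is a plain implementation of the product-moment correlation, written out here so the example has no dependencies.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical a estimates from one sample size.
a_first = [0.8, 1.0, 1.2, 1.4, 1.0]
# A second set that is an exact linear transformation of the first.
a_linear = [1.25 * v + 0.1 for v in a_first]
# A second set that is not a linear transformation of the first.
a_nonlinear = [0.9, 1.3, 1.1, 1.5, 0.8]

print(round(pearson_r(a_first, a_linear), 4))
print(round(pearson_r(a_first, a_nonlinear), 4))
```

A correlation well below 1.0 between two sets of estimates of the same items, as in Table 7 for the smaller samples, therefore signals that no single slope and intercept can map one set onto the other.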



IV. DISCUSSION

The results of the study present new evidence of the critical interrelationship between item
calibration and equating sample sizes and the values of ICC parameters.

Estimating and Equating a

For the 16 combinations of calibration sample sizes and equating sample sizes identified in
Table 6, the least deviation of estimated a from its known value occurred with an equating
sample size of 2,000 and a calibration sample size of 500. As mentioned in the previous section,
although the least error between the estimated and known a values was expected with a match of
2,000 equating and 2,000 calibrating sample sizes, the error actually increased very slightly with
increasing calibration sample sizes beyond 500. This discrepancy apparently results from a
non-linear transformation with sample sizes of 250 and 500 but tends toward linearity with sample
sizes of 1,000 and 2,000.

During equating procedures, a sample size > 500 should be developed to ensure an
acceptable degree of confidence that the estimation of a does not significantly depart from its
"true" value. In the same light, estimation of a suffers considerably using equating sample sizes of
less than 500, such that equating samples of 1,000 or 2,000 are highly desirable to minimize error
in estimating a.

Estimating and Equating b

Table 6 also shows the linear relationship between error and sample size for the b
parameter. The b parameter is best estimated with calibration and equating samples of 2,000 each,
although a calibration sample size of 1,000 with an equating sample size of 500 can be tolerated
without an appreciable increase in error. With all combinations of calibration and equating sample
sizes, b is estimated quite well.



Estimating and Equating c

The flat line drawn in Figure 2, representing the data from Table 5, shows the estimation of
the c parameter to be nearly insensitive to increases in sample size. As sample size increases from
250 to 2,000 subjects, the error decreases but only very slightly. With c defined as the lower
asymptote of the ICC and representing the probability of extremely low ability examinees
correctly answering an item, the inability to estimate c with precision could be disturbing.
However, it has been pointed out (Lord, 1975) that if a(θ - b) < -2, then the probability of a
correct response is essentially c. Therefore, if there are a large number of subjects with ability θ
such that θ < (b - 2/a), c can be accurately estimated. If this requirement is not met, c will be
poorly estimated.
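As a rough numerical check of this condition, the sketch below computes the ability cutoff b - 2/a for one item and evaluates how far the Equation 1 probability still lies above c at that cutoff. The item parameters and function names are hypothetical.

```python
import math

def icc_3pl(theta, a, b, c):
    """Equation 1: three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def c_region_cutoff(a, b):
    """Ability below which a * (theta - b) < -2, i.e. theta < b - 2 / a."""
    return b - 2.0 / a

a, b, c = 1.0, 0.0, 0.20                  # hypothetical item
theta_cut = c_region_cutoff(a, b)
excess = icc_3pl(theta_cut, a, b, c) - c  # how far P still lies above c
print(theta_cut, round(excess, 4))
```

At the cutoff the logistic term of Equation 1 is 1/(1 + e^3.4), roughly .03, so subjects below the cutoff respond correctly with a probability within a few hundredths of c. With abilities distributed about as in Table 1, only a small fraction of subjects fall that low for an item of average difficulty, which is consistent with the weak c estimates reported above.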

A stable and accurate estimate of the a and b parameters requires large numbers of subjects
over a broad range of ability. The estimation of c requires large numbers of subjects at very low
ability levels. This holds for both equating and calibrating samples; therefore, it is necessary to
administer test items, whether to be calibrated or equated, to the largest samples available,